Mitochondrial genome features and systematic evolution of Diospyros kaki Thunb. 'Taishuu'

Background 'Taishuu' has a crisp texture, abundant juice, and sweet flavor with hints of cantaloupe. Mitochondrial genome data are available for only a small fraction of the known Diospyros species.
Results The sequencing data were assembled into a closed circular mitochondrial chromosome with a length of 421,308 bp and a GC content of 45.79%. The mitochondrial genome comprised 40 protein-coding, 24 tRNA, and three rRNA genes. The most common codons for arginine (Arg), proline (Pro), glycine (Gly), tryptophan (Trp), valine (Val), alanine (Ala), and leucine (Leu) were AGA, CCA, GGA, UGG, GUA, GCA, and CUA, respectively. The start codon for the protein-coding genes cox1 and nad4L was ACG (ATG), whereas the remaining protein-coding genes started with ATG. Four types of stop codons were used (CGA, TAA, TAG, and TGA), with TAA being the most frequent (45.24%). In the D. kaki Thunb. 'Taishuu' mitochondrial genome, a total of 645 repeat sequences were identified, including 125 SSRs, 7 tandem repeats, and 513 dispersed repeats. Collinearity analysis revealed a close relationship between D. kaki Thunb. 'Taishuu' and Diospyros oleifera, with conserved homologous gene fragments shared among the compared species across large regions of the mitochondrial genome. The protein-coding genes ccmB and nad4L were found to be under positive selection. Analysis of homologous sequences between chloroplasts and mitochondria identified 28 homologous segments with a total length of 24,075 bp, accounting for 5.71% of the mitochondrial genome. These homologous segments contain 8 annotated genes, including 6 tRNA genes and 2 other genes (rrn18 and ccmC). There are 23 homologous genes between chloroplasts and nuclei. Mitochondria, chloroplasts, and nuclei share two homologous genes, trnV-GAC and trnW-CCA.
Conclusion A high-quality chromosome-level draft genome for D. kaki was generated in this study, which will contribute to further studies of major economic traits in the genus Diospyros.
Supplementary Information The online version contains supplementary material available at 10.1186/s12864-024-10199-0.


Introduction
Diospyros Linn. is a genus in the family Ebenaceae, within the dicotyledonous plants. It comprises approximately 500 species worldwide and is widely distributed in tropical and subtropical regions [1]. Diospyros kaki Thunb., commonly known as persimmon, can be classified into two major categories based on the genetic traits linked to the natural de-astringency of its fruits: pollination-constant non-astringent (PCNA) and non-pollination-constant non-astringent (non-PCNA). Based on the regulation of different gene loci, PCNA can be further categorized into Japanese PCNA (JPCNA) and Chinese PCNA (CPCNA), whereas non-PCNA can be sub-classified into pollination-variant non-astringent (PVNA), pollination-constant astringent (PCA), and pollination-variant astringent (PVA). PCNA persimmons are highly valued fruits with broad development prospects. They naturally lose astringency upon ripening, which gives them excellent palatability, and they offer high nutritional and health benefits compared to astringent persimmons [2,3]. 'Taishuu' is a PCNA persimmon cultivar developed in Japan [4]. It has a crisp texture, abundant juice, and sweet flavor with hints of cantaloupe, and its quality is superior to that of other persimmon varieties. After ripening, the fruit turns orange-red and reaches an average weight of 200-400 g. The flesh is delicate and plump, with a long shelf life that facilitates storage and transportation (Fig. S1).
Mitochondria, organelles with independent genetic material in eukaryotic cells, are vital in cellular metabolism, apoptosis, disease, aging, and other cellular processes. Due to its simple structure, compact arrangement, and low mutation rate, the mitochondrial genome has been widely applied in molecular systematics and phylogeography studies [5,6]. Compared to chloroplast DNA (cpDNA), plant mitochondrial DNA (mtDNA) has a complex and dynamic structure [6,7]. It contains multiple repeat sequences in the non-coding regions, which can give rise to various recombinant subgenomic forms of mtDNA [8,9]. Therefore, the plant mtDNA map typically represents a circular DNA molecule composed of the entire mtDNA sequence, also known as the master chromosome [10]. Several studies have also independently reported subgenomic DNA molecules within plant mtDNA maps [8,11,12]. Although most studies have represented the mtDNA structure as circular, linear structures have also been reported in some plants. For instance, maize S-type cytoplasmic male sterility (CMS) mtDNA primarily exists as multiple linear molecules [13], whereas the maintainer line mtDNA in the soybean trisomic hybrid system exists in linear and circular structures simultaneously [12]. The structural complexity of plant mtDNA is primarily attributed to the presence of forward and reverse repeat sequences of various lengths [14,15]. Long repeat sequences (≥ 500 bp) cause frequent homologous recombination, transforming mtDNA into nearly equimolar mixtures that include interchangeable isomeric rings and subgenomic circles. In some plant species, such as Dichanthium annulatum [16], cucumber [8], and sugarcane [11], mtDNA was found to be composed of two or more independent circular DNA molecules, possibly due to abnormal recombination caused by shorter repeat sequences.
In addition to providing rich molecular data, the study of the sweet persimmon mitochondrial genome is conducive to exploring genetic evolution in the genus Diospyros. However, mitochondrial genome data are available for only a small fraction of the known Diospyros species, and their mtDNAs have primarily been identified as small fragments, resulting in a shortage of molecular information. Currently, the Diospyros oleifera mtDNA is the only publicly available complete mitochondrial genome in the genus Diospyros [17], which limits in-depth molecular research on Diospyros species. The current study analyzed the mitochondrial genome of 'Taishuu' for the first time using second-generation and third-generation sequencing technologies, revealing its structural characteristics. Further, comparative genome analysis and phylogenetic tree construction were performed to evaluate evolutionary relationships. These results can provide a foundation for studying the systematics and population evolution of 'Taishuu' and present molecular data for taxonomic research on Diospyros.
The start codon for the protein-coding genes cox1 and nad4L was ACG (ATG), whereas it was ATG for the remaining protein-coding genes. Four types of termination codons, CGA, TAA, TAG, and TGA, were used at frequencies of 4.76%, 45.24%, 16.67%, and 33.33%, respectively. TAA was the most frequently used stop codon (Tab. S2).
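The stop-codon frequencies above can be recomputed from a set of coding sequences by tallying their terminal codons. The following Python sketch is illustrative only: the function name and the toy input are hypothetical, and the counts (2, 19, 7, and 14 out of 42) are an assumption, back-calculated because they reproduce the reported percentages; they are not taken from Tab. S2.

```python
from collections import Counter


def stop_codon_usage(cds_sequences):
    """Return {terminal codon: percentage} for a list of coding sequences."""
    # The last three bases of each CDS are taken as its termination codon.
    counts = Counter(seq[-3:].upper() for seq in cds_sequences)
    total = sum(counts.values())
    return {codon: round(100 * n / total, 2) for codon, n in counts.items()}


# Hypothetical toy input: 42 short CDS stubs whose terminal-codon counts were
# chosen to match the reported percentages (assumed, not from Tab. S2).
toy_cds = (["ATGCGA"] * 2 + ["ATGTAA"] * 19
           + ["ATGTAG"] * 7 + ["ATGTGA"] * 14)

usage = stop_codon_usage(toy_cds)
for codon in ("CGA", "TAA", "TAG", "TGA"):
    print(codon, usage[codon])
```

In practice, the input would be the annotated CDS features extracted from the assembled genome rather than toy strings.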
Previous studies suggested that the mt genomes of most terrestrial plants contain three rRNA genes [18]. The three rRNA genes, rrn18, rrn26, and rrn5, were annotated in the 'Taishuu' persimmon mitochondrial genome, with lengths of 1903 bp, 3382 bp, and 121 bp, respectively. In addition, 24 different tRNAs were identified in the 'Taishuu' mt genome, transporting 17 different amino acids. These findings indicate that the same amino acid can be carried by two or more tRNAs (Tab. S2).
In the 'Taishuu' mitochondrial genome, 645 repeat sequences were identified, including 125 SSRs, 7 tandem repeats, and 513 dispersed repeats with lengths equal to or greater than 30 bp. Detected SSR loci included mono-, di-, tri-, tetra-, penta-, and hexanucleotide repeats, with the three most common being A/T (25.6%), AG/CT (16.8%), and AAAG/CTTT (10.4%). The distribution of these repeat sequences on the genomic map is shown in Fig. 3A. Interspersed repeats (IRs) were dispersed across the genome; among these, 238 were direct repeats, whereas 275 were palindromic repeats (Fig. 3B). The longest direct repeat was 463 bp, and the longest palindromic repeat was 194 bp. The length distribution for direct and palindromic repeats is shown in Fig. 3C. Both types of repeats were most abundant within the 30-39 bp range.
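As an illustration of how perfect SSR loci of the kinds listed above can be located, the following Python sketch scans a sequence with back-referencing regular expressions. It is not the tool used in this study; the function name and the minimum repeat thresholds are assumptions chosen for the example, since the thresholds applied here are not stated.

```python
import re

# Assumed minimum repeat counts per motif length (hypothetical thresholds;
# the source does not state the ones it used).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) tuples for perfect SSRs (0-based start)."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # ([ACGT]{k}) captures a motif; \1{n,} requires at least n more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "AAAA" is
            # a mononucleotide run, not a tetranucleotide SSR.
            if any(motif == motif[:k] * (motif_len // k)
                   for k in range(1, motif_len) if motif_len % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return hits


# Toy sequence containing an A run, an AG repeat, and an AAAG repeat.
toy = "GGC" + "A" * 12 + "CCGT" + "AG" * 6 + "TTC" + "AAAG" * 3 + "GC"
for start, motif, count in find_ssrs(toy):
    print(start, motif, count)
```

Motifs are reported as found on the scanned strand; grouping complementary motifs (for example AG with CT) would require an additional normalization step.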

Comparative analysis of mitochondrial genomes
To further determine the gene rearrangements between 'Taishuu' persimmon and closely related varieties, a collinearity analysis was performed using the mitochondrial gene sequences of six closely related plant species available in NCBI (Fig. 4). The results showed that 'Taishuu' shared the closest relationship with Diospyros oleifera. Homologous gene fragments common to these species were relatively conserved and occupied most of the mitochondrial genome regions. The other homologous gene clusters might have undergone varying degrees of rearrangement or loss, reflecting that the mitochondrial genomes of these species exhibit a general conservation trend with local dynamic changes during evolution. Based on the annotation results, we observed that the conserved regions primarily included protein-coding genes, while the variable regions primarily included intergenic regions. It can be speculated that the insertion or loss of unknown sequences in the intergenic regions during the evolution of the mitochondrial genome might have led to rearrangement.
Next, the mitochondrial gene sequences were compared among Diospyros species, using the rps7 gene as the starting point. The arrangement of genes in the mitochondrial genomes of the two Diospyros species is shown in Fig. 5. The arrangement of protein-coding genes in 'Taishuu' and Diospyros oleifera was largely consistent, with minor differences: nad2ab and atp9 were inverted. Minor gene inversions and losses were observed, leading to differences in gene numbers.

Evolutionary and phylogenetic analysis of mitochondrial protein-coding genes
A base mutation that changes the encoded amino acid is termed nonsynonymous; otherwise, it is termed synonymous. The ratio of the nonsynonymous substitution rate (Ka) to the synonymous substitution rate (Ks) is used to infer selection pressure: a Ka/Ks ratio greater than 1 indicates positive selection, whereas a ratio less than 1 suggests purifying selection. We aligned 38 protein-coding genes in the 'Taishuu' mitochondrial genome with the mt genomes of six other species and analyzed them using Ka/Ks values (Tab. S7). As shown in Fig. 6, 35 genes were negatively selected during evolution, indicating that most of the protein-coding genes in the mt genome are relatively conserved. The Ka/Ks values of the protein-coding genes ccmB and nad4L were 1.26 and 1.07, respectively (greater than 1), indicating that these two genes might have undergone positive selection.
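To make the synonymous/nonsynonymous distinction concrete, the following Python sketch compares two aligned, gap-free coding sequences codon by codon and labels a Ka/Ks value using the thresholds described above. It is a simplified illustration, not the estimator used in this study: a real Ka/Ks calculation also normalizes by the numbers of synonymous and nonsynonymous sites. The function names are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code (assumed), built in TCAG order: TTT, TTC, TTA, ...
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitutions(seq_a, seq_b):
    """Count differing codons that are synonymous vs nonsynonymous."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned, gap-free, and in frame")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1      # same amino acid: synonymous change
        else:
            nonsyn += 1   # different amino acid: nonsynonymous change
    return syn, nonsyn


def selection_label(ka_ks):
    """Interpret a Ka/Ks ratio with the conventional thresholds."""
    if ka_ks > 1:
        return "positive"
    if ka_ks < 1:
        return "purifying"
    return "neutral"


print(classify_substitutions("ATGGCTAAA", "ATGGCCAAA"))  # GCT -> GCC, both Ala
print(classify_substitutions("ATGGCT", "ATGGAT"))        # GCT (Ala) -> GAT (Asp)
print(selection_label(1.26), selection_label(1.07))      # reported ccmB, nad4L
```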
Phylogenetic analysis based on mitochondrial genes (Fig. 7) showed that 'Taishuu' and D. oleifera, two species from the Ebenaceae family, clustered together, indicating their close evolutionary relationship. 'Taishuu' was also placed close to Aegiceras corniculatum and Rhododendron simsii, which belong to the order Ericales, suggesting that mitochondrial protein-coding genes are suitable material for resolving phylogenetic relationships among different plant species.

Discussion
In the current study, second- and third-generation sequencing technologies were used to assemble the 'Taishuu' persimmon mitochondrial genome for the first time. The mitochondrial genome of 'Taishuu' persimmon was found to be circular, consistent with previous reports that most plant mitochondrial genomes are circular [18]. The size of the 'Taishuu' persimmon mitochondrial genome is 421,308 bp, with a GC content of 45.79%, slightly higher than that of Oryza sativa (43.8%) [19], Zea mays (43.9%) [20], Hibiscus cannabinus (44.9%) [21], and Gossypium raimondii (44.95%) [22] and comparable to that of Amborella trichopoda (45.9%) [23]. This places it at the higher end of GC content among higher plants. Some genes contain one or more introns, which may be critical in the regulation of gene expression. Most terrestrial plants contain three rRNA genes [18]. These three rRNA genes, rrn18, rrn26, and rrn5, were annotated in the mitochondrial genome of 'Taishuu' persimmon.
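The GC content values compared above are simple base-composition ratios. As a minimal Python sketch (the function name is hypothetical, and ignoring ambiguous bases such as N is an assumption of this example), GC content can be computed as follows:

```python
def gc_content(sequence):
    """Return GC content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    unambiguous = sum(seq.count(base) for base in "ACGT")
    if unambiguous == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    # Ambiguous symbols such as N are excluded from the denominator.
    return 100 * (seq.count("G") + seq.count("C")) / unambiguous


print(gc_content("GGCCAATT"))  # toy sequence with four G/C out of eight bases
```

Applied to the assembled 421,308 bp sequence, such a function would be expected to return the reported value of about 45.79%.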
In the Diospyros mitochondrial genome, the protein-coding region accounted for only 7.82% of the total length, whereas non-coding regions accounted for over 92%. The functional classification of protein-coding genes is similar among Diospyros mitochondrial genomes. The coding region of the genome was more conserved than the non-coding region, which was primarily responsible for the differences among Diospyros mitochondrial genomes [24]. The intergenic regions of the mitochondrial genome mainly consisted of repeat sequences, homologous sequences from the chloroplast genome, and homologous sequences from the nuclear genome. The repeat sequences included tandem repeats. Short and long repeat sequences are widely found in the mitochondrial genome [25]. They are essential for molecular recombination of the mitochondrial genome and are usually considered the major contributor to differences among plant mitochondrial genomes [23]. Most protein-coding genes in the Diospyros mitochondrial genome were conserved during the long evolutionary process. Studies of the mitochondrial genome provide sufficient molecular marker sites for studying the systematic evolution of this genus. The only mitochondrial information for the genus Diospyros currently available on the NCBI website is for Diospyros oleifera. Our phylogenetic results showed that D. kaki and D. oleifera were placed on the same branch, consistent with the mitochondrial genome collinearity analysis, revealing a close evolutionary relationship.
Plant mitochondrial genomes always contain sequences transferred from chloroplast genomes, usually accounting for 1-12% of the total length [26]. Nearly one-third of tRNA genes originate from chloroplasts and have gradually migrated during evolution [27]. A previous study on higher plants has shown that approximately 42% of the chloroplast genome fragments have been integrated into the 773,279 bp grape (Vitis vinifera) mitochondrial genome, including more than thirty chloroplast protein-coding genes and 17 tRNA genes [28]. Furthermore, in the 982,833 bp zucchini (Cucurbita pepo) mitochondrial genome, more than 113 kb of chloroplast genome fragments were identified [29], whereas the chloroplast sequences in rice accounted for 6.2% of the genome [19]. Our study identified 24,075 bp of homologous sequences in the 'Taishuu' persimmon mitochondrial and chloroplast genomes, accounting for approximately 5.71% of the mitochondrial genome. These fragments may be critical during evolution.
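The reported proportion follows directly from the two lengths given above (24,075 bp out of 421,308 bp). The Python sketch below reproduces that arithmetic and shows one way to total a set of homologous segments. The function names and segment coordinates are hypothetical, and merging overlapping segments before summing is an assumption of this example; the source does not state how its total was derived.

```python
def covered_length(segments):
    """Total bp covered by 1-based inclusive (start, end) segments, overlaps merged."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(segments):
        if cur_end is None or start > cur_end:
            # Close the previous merged block before starting a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping segment: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def percent_of_genome(length_bp, genome_bp):
    """Share of the genome, in percent, rounded to two decimals."""
    return round(100 * length_bp / genome_bp, 2)


# Hypothetical segment coordinates, for illustration only.
print(covered_length([(1, 10), (5, 20), (30, 40)]))
# Lengths reported in the text: homologous total and genome size.
print(percent_of_genome(24075, 421308))  # 5.71
```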
DNA migration occurs slowly during evolution. During the transfer process, chloroplast genome fragments often carry some chloroplast protein-coding genes into the mitochondrial genome; however, these genes lose their integrity and become pseudogenes following incorporation into the mitochondrial genome, possibly due to genomic sequence recombination [30]. Our analysis of chloroplast-derived sequences in the 'Taishuu' persimmon mitochondrial genome supported the same conclusion. In contrast, the non-protein-coding genes function normally after being transferred into the mitochondrial genome. Complete structures of 24 different tRNA genes were observed in 'Taishuu' persimmon, which might have normal transport functions, indicating that tRNA genes are more conserved than protein-coding genes in the mitochondrial genome. This characteristic may be unique to higher plant mitochondria during evolution. Currently, the mechanisms and expression patterns of sequence migration between 'Taishuu' persimmon genomes are unknown. Therefore, further refinement of the 'Taishuu' persimmon whole-genome project will help address these research gaps.
For the first time, we applied mitochondrial whole-genome sequences to the evolutionary analysis of 'Taishuu' persimmon. A phylogenetic analysis was conducted based on the publicly available mitochondrial genome of one species from the Ebenaceae family and 20 other published plant mitochondrial genome sequences. The results showed clear taxonomic distinctions between the species. 'Taishuu' and D. oleifera, both belonging to the Ebenaceae family, formed one cluster. Consistent with the biological classification, they were also phylogenetically close to Aegiceras corniculatum and Rhododendron simsii, which belong to the order Ericales. The current study of the mitochondrial genome represents only the tip of the iceberg, and additional research must be carried out to obtain accurate conclusions concerning the Diospyros mitochondrial genome. In addition, this report verifies that mitochondrial genome sequences have certain advantages in

Fig. 1 'Taishuu' Mitochondrial Genome Map. Genes encoded in the forward direction are located outside the circle, whereas those encoded in reverse are located inside. The inner gray circle represents GC content. In the linear presentation, genes encoded in the forward direction are above the circle, while those in the reverse are below

Fig. 2 RSCU Bar Chart. The squares below represent all codons encoding each amino acid, while the height of the bars above represents the total RSCU value of all codons

Fig. 5 Schematic representation of the order of mtDNA genes in Diospyros kaki and Diospyros oleifera

Fig. 7 Mitochondrial Evolutionary Analysis. Evolutionary branch length, also known as genetic variation or evolutionary distance, represents the degree of branch variation; a shorter branch length represents a smaller difference and closer evolutionary distance. Distance scale: unit length of numerical differences between organisms or sequences, equivalent to the scale of the evolutionary tree. The bootstrap value is marked at the node position and used to evaluate the credibility of the branch

Fig. 8 Homologous fragment analysis of mitochondrial genome, chloroplast genome, and nuclear genome. Chloroplast indicates chloroplast sequences, while the others indicate mitochondrial sequences. The genes from the same complex are marked with the same color, and the connecting line indicates homologous sequences